NONPARAMETRIC ESTIMATION OVER SHRINKING
NEIGHBORHOODS: SUPEREFFICIENCY AND ADAPTATION

By T. Tony Cai and Mark G. Low 

University of Pennsylvania 

A theory of superefficiency and adaptation is developed under 
flexible performance measures which give a multiresolution view of 
risk and bridge the gap between pointwise and global estimation. 
This theory provides a useful benchmark for the evaluation of spatially
adaptive estimators and shows that the possible degree of superefficiency
for minimax rate optimal estimators critically depends
on the size of the neighborhood over which the risk is measured.

Wavelet procedures are given which adapt rate optimally for given 
shrinking neighborhoods including the extreme cases of mean squared 
error at a point and mean integrated squared error over the whole 
interval. These adaptive procedures are based on a new wavelet block
thresholding scheme which combines both the commonly used horizontal
blocking of wavelet coefficients (at the same resolution level)
and vertical blocking of coefficients (across different resolution levels).

1. Introduction. Squared error loss at each point and integrated squared 
error loss over an interval are two of the most common ways to evaluate the 
performance of nonparametric function estimators. Integrated squared error 
is used as a broad overall measure of loss whereas pointwise squared error loss 
gives a highly localized measure of accuracy. Minimax theory for both these 
cases can be found for example in Pinsker (1980), Ibragimov and Hasminski 
(1984), Donoho and Liu (1991) and Donoho and Johnstone (1998), and there 
are a large number of additional references in Efromovich (1999). 

In nonparametric function estimation problems minimax risk provides a 
useful uniform benchmark for the comparison of estimators. Such uniform 
bounds do not, however, capture many aspects of these problems since in 
these infinite-dimensional settings asymptotically minimax estimators can 
often be constructed which are also superefficient at every parameter point.
In fact, much recent work on nonparametric function estimation can be 
viewed as attempts to construct superefficient estimators with desirable 
properties. This is clear in the literature on adaptive estimation where the 
connection between superefficiency and adaptation has been considered as, 
for example, in Beran (1999, 2000). In adaptive estimation the goal is to 
construct estimators which are simultaneously asymptotically (near) mini- 
max over a collection of parameter spaces. Such estimators are optimal over 
this range of spaces. 

This theory of adaptive estimation depends strongly on how risk is mea- 
sured. When the performance is measured globally full adaptation can often 
be achieved. In particular, Efromovich and Pinsker (1984) constructed fully 
adaptive estimators over a range of Sobolev spaces. Recent results on rate 
adaptive estimators focus on more general Besov spaces. See, for example, 
Donoho and Johnstone (1995), Cai (1999) and Härdle, Kerkyacharian, Picard
and Tsybakov (1998).

When the performance is measured at a point, it is often the case that full 
adaptation is not possible and superefficient estimators must have inflated 
risk at other parameter points. A penalty, usually a logarithmic factor, must 
be paid for not knowing the smoothness. Important work in this area be- 
gan with Lepski (1990) where attention focused on a collection of Lipschitz 
classes. See also Brown and Low (1996), Efromovich and Low (1994), Lepski 
and Spokoiny (1997) and Cai (2003). 

Since optimally adaptive estimators at each point typically pay a log- 
arithmic penalty compared to the minimax risk, they are not necessarily 
optimally globally adaptive. This has led to the approach of a simultane- 
ous pointwise and global analysis. The goal is then to construct estimators 
which, for a range of parameter spaces, are both minimax rate optimal for 
integrated squared error loss and pay only a logarithmic penalty for squared 
error loss at each point. See, for example, Cai (1999, 2002) and Efromovich 
(2002). 

Pointwise mean squared error can be viewed as an extreme (although
useful) way of measuring the local performance of an estimator $\hat f_n$. The focus in
the present paper is on a more flexible approach. Specifically we propose to
evaluate the performance of an estimator $\hat f_n$ near $x_0$ by using an average
mean squared error over a neighborhood of $x_0$:

(1)    $R(\hat f_n, f; x_0, c_n) = \frac{1}{2c_n} E_f \int_{x_0 - c_n}^{x_0 + c_n} (\hat f_n(x) - f(x))^2 \, dx.$

The choice of $c_n$ allows for considerable flexibility when measuring local
performance. For fixed $n$, by taking the limit as $c_n \to 0$ we recover the
usual case of squared error loss at $x_0$, and by taking $x_0 = \frac{1}{2}$ and $c_n = \frac{1}{2}$ we
recover the usual global risk. By evaluating the performance for a whole
range of $c_n$ it is possible to give a multiresolution view of the risk. We show
that this more flexible approach to measuring local performance can be used
to bridge the gap between the pointwise and global theories.
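As a minimal numerical sketch of the risk measure (1), the following Python snippet averages a pointwise mean squared error curve over the window $[x_0 - c_n, x_0 + c_n]$. The function name `local_average_risk`, the grid and the example error curve are illustrative assumptions and are not taken from the paper; the curve stands in for $E_f(\hat f_n(x) - f(x))^2$, which in practice would have to be estimated, for example by simulation.

```python
import numpy as np

def local_average_risk(sq_err, grid, x0, c):
    """Average a pointwise squared-error curve over [x0 - c, x0 + c]."""
    mask = (grid >= x0 - c) & (grid <= x0 + c)
    xs, ys = grid[mask], sq_err[mask]
    if xs.size < 2:
        # Window narrower than the grid spacing: use the value nearest x0,
        # which mimics the pointwise limit c -> 0.
        return float(sq_err[np.argmin(np.abs(grid - x0))])
    # Trapezoid rule, normalised by the window length actually covered.
    integral = np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs))
    return float(integral / (xs[-1] - xs[0]))

grid = np.linspace(0.0, 1.0, 1001)
mse_curve = (grid - 0.3) ** 2  # hypothetical pointwise MSE curve (assumption)

global_risk = local_average_risk(mse_curve, grid, 0.5, 0.5)   # x0 = c = 1/2
local_risk = local_average_risk(mse_curve, grid, 0.5, 0.05)   # shrinking window
point_risk = local_average_risk(mse_curve, grid, 0.5, 1e-4)   # essentially pointwise
print(global_risk, local_risk, point_risk)
```

Sweeping `c` between these extremes gives the multiresolution view of risk described above, under the stated assumption that the pointwise error curve is available on a grid.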

In this paper we consider estimation over shrinking neighborhoods based
on observations from a Gaussian process

(2)    $Z^*(t) = \int_0^t f(x)\,dx + \frac{1}{\sqrt{n}} B^*(t), \quad 0 \le t \le 1,$

where $B^*(t)$ is a standard Brownian motion and $f$ is an unknown function.
This Gaussian process is a prototypical model for many nonparametric
function estimation problems such as nonparametric regression and density
estimation.

In Section 2 it is shown that the size of the neighborhood as governed by $c_n$
determines both the possible degree of superefficiency for minimax rate optimal
estimators as well as the cost of adaptation. For "small" neighborhoods
superefficient estimators cannot be minimax rate optimal and hence fully
rate adaptive estimation is not possible. In fact the penalty for superefficiency
determines the cost of adaptation. On the other hand, for "large"
neighborhoods there exist minimax rate optimal estimators which are
superefficient at every parameter point.

Adaptive estimation is considered in Sections 3 and 4. In Section 3 a 
procedure is constructed which optimally adapts to smoothness over given 
shrinking neighborhoods. This construction includes the extreme cases of 
mean squared error at a point and mean integrated squared error over the 
whole interval. 

The adaptive procedure used in Section 3 is based on block threshold- 
ing of empirical wavelet coefficients, a technique which has been shown to 
be effective for adaptive estimation. See, for example, Hall, Kerkyacharian
and Picard (1998) and Cai (1999, 2002). Block thresholding in these papers
is done by blocking of wavelet coefficients only at the same resolution
level. The adaptive procedure proposed here is based on a new block thresh- 
olding scheme. It combines both the commonly used horizontal blocking 
of wavelet coefficients (at the same resolution level) and vertical blocking 
of coefficients (across different resolution levels). Furthermore, it appears 
that vertical blocking is essential for the resulting estimator to be optimally 
adaptive. 

The theory of adaptive estimation over given shrinking neighborhoods de- 
veloped in Sections 2 and 3 provides a useful benchmark for the evaluation 
of estimators designed to be spatially adaptive. Spatially adaptive proce- 
dures should however adapt not just to unknown smoothness but also to 
a whole range of shrinking neighborhoods over which the risk is measured. 
This more complete analysis incorporating a multi-resolution view of risk 
is given in Section 4. In that section it is shown that a block thresholding 
estimator introduced in Cai (1999) exhibits, from this point of view, good
spatial adaptivity. 

2. Superefficiency and adaptation. In nonparametric function estima- 
tion problems minimax risk depends strongly on the parameter space. Typ- 
ically the parameter space is unknown and so attention is often focused on 
the construction of adaptive estimators which simultaneously attain near 
minimaxity over a collection of parameter spaces. The theory of adaptive
estimators is closely connected to that of superefficient estimators which in
turn depends on how the risk is measured.

In this paper we shall develop the shrinking neighborhood theory for
Hölder classes

(3)    $F(\alpha, M) = \{f : |f^{(k)}(x) - f^{(k)}(y)| \le M |x - y|^{\alpha - k},\ 0 \le x < y \le 1\},$

where $k$ is the greatest integer strictly less than $\alpha$. Minimax theory in this
setup is standard. In particular, under the risk measure (1) with observations
from the Gaussian process (2) the minimax rate of convergence over $F(\alpha, M)$
is of order $n^{-2\alpha/(2\alpha+1)}$. The theory for superefficiency and adaptation is
however quite interesting.

The focus in this section is on how the size of the shrinking neighborhood
affects the penalty for superefficient estimators. The connection between
superefficient estimation and adaptation is then made clear. Our interest
in superefficiency is mainly for the insight it provides for the question of
adaptation and we show how lower bounds derived for the penalty of
superefficiency are directly applicable to the minimum cost of adaptation.

2.1. Superefficiency. For a parameter space $\mathcal{F}$ we call an estimator $\hat f_n$
superefficient at $f \in \mathcal{F}$ under a loss function $L_n$ if the risk at $f$ converges
faster than the minimax risk, namely

(4)    $\frac{E_f L_n(\hat f_n, f)}{\inf_{\hat f_n} \sup_{f \in \mathcal{F}} E_f L_n(\hat f_n, f)} \to 0.$

As mentioned in the Introduction, for estimation under mean integrated
squared error (i.e., $x_0 = \frac{1}{2}$ and $c_n = \frac{1}{2}$) fully rate adaptive estimators exist
and so there are superefficient estimators which are also minimax rate optimal.
In particular, Brown, Low and Zhao (1997) give examples of estimating
the whole function under integrated squared error loss where an estimator is
superefficient at every parameter point while also maintaining the minimax rate
of convergence. On the other hand, for estimation under pointwise mean
squared error ($c_n = 0$) Lepski (1990) and Brown and Low (1996) showed that
no superefficient estimator can be minimax rate optimal over $F(\alpha, M)$
and hence in this case fully rate optimal adaptation is not possible. This case
is similar to the superefficiency phenomenon arising in regular parametric
problems. See, for example, Le Cam (1953) and Lehmann (1983).

As argued in the Introduction, integrated squared error and pointwise 
squared error are two extremes of a whole range of risk measures, each of 
which sheds light on the performance of a particular estimator $\hat f_n$. Shrinking
neighborhoods give a more general way to evaluate the performance of an 
estimator. We begin by exploring the minimal cost of superefficiency for a 
specified shrinking neighborhood and find the critical size of neighborhood 
which will allow for the construction of superefficient estimators which are 
also minimax rate optimal. 

For a given shrinking neighborhood of $x_0$ let $A(f_0)$ be the collection of
estimators $\hat f_n$ based on the Gaussian observations (2) that are superefficient
at rate $B_n$ at the parameter point $f_0$. More specifically, let

(5)    $A(f_0) = \{\hat f_n : \limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} B_n R(\hat f_n, f_0; x_0, c_n) < \infty\}.$

The following result then precisely quantifies the minimum penalty of such
superefficient estimators.

Theorem 1. Fix $0 < x_0 < 1$, $0 < M' < M$, and set $c_n = d_n n^{-1/(1+2\alpha)}$.
Let $B_n \to \infty$ and $n/\log B_n \to \infty$ and suppose that $f_0 \in F(\alpha, M')$.

(i) If $\limsup_{n\to\infty} d_n \cdot (\log B_n)^{-1/(1+2\alpha)} = 0$, then for any $\hat f_n \in A(f_0)$,

(6)    $\liminf_{n\to\infty} \left(\frac{n}{\log B_n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) > 0$

and there exists some $\hat f_n \in A(f_0)$ satisfying

(7)    $\limsup_{n\to\infty} \left(\frac{n}{\log B_n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(ii) If $\liminf_{n\to\infty} d_n \cdot (\log B_n)^{-1/(1+2\alpha)} > 0$ and $\limsup_{n\to\infty} d_n \cdot (\log B_n)^{-1} = 0$,
then for any $\hat f_n \in A(f_0)$,

(8)    $\liminf_{n\to\infty} n^{2\alpha/(1+2\alpha)} \cdot \frac{d_n}{\log B_n} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) > 0$

and there exists some $\hat f_n \in A(f_0)$ satisfying

(9)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \cdot \frac{d_n}{\log B_n} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(iii) If $\liminf_{n\to\infty} \frac{d_n}{\log B_n} > 0$, then there exists an estimator $\hat f_n \in A(f_0)$
satisfying

(10)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

Note that the rate in the upper bound in case (iii) is sharp because it is
also the minimax rate of convergence.

Theorem 1 gives bounds on the maximum risk after prespecifying the degree
of superefficiency. For each of the three cases the proof of Theorem 1
constructs specific wavelet block thresholding procedures which attain the
lower bounds. In other words, these wavelet procedures have minimal maximum
risk given a particular level of superefficiency at a specified function.

Alternatively, it is also useful to classify the existence of minimax super- 
efficient estimators in terms of a given neighborhood. The results can then 
be conveniently summarized as follows. 

Case 1 (Small neighborhoods). When the size of the neighborhood is
smaller than $D n^{-1/(2\alpha+1)}$ (i.e., $0 \le d_n \le D$) for some constant $D$, no minimax
rate optimal estimator can be superefficient. In particular, when $d_n = 0$,
which corresponds to the usual pointwise risk at $x_0$, superefficient estimators
cannot be minimax rate optimal. In other words, minimax rate optimal
estimators must have the same "flat" rate of convergence at every $f$ in the
interior of $F(\alpha, M)$.

Case 2 (Large neighborhoods). When the size of the neighborhood satisfies
$\liminf d_n = \infty$ there are superefficient estimators attaining the minimax
rate. The possible degree of superefficiency of a minimax rate optimal
estimator however depends on the size of the neighborhood as described in
the following three cases.

Case A. $\liminf_{n\to\infty} d_n = \infty$ and $\limsup_{n\to\infty} \frac{d_n}{\log n} = 0$. In this case a minimax
rate optimal estimator can be superefficient at $f_0$, but the rate of convergence
of its risk at $f_0$ cannot be algebraically faster than the minimax
rate.

Case B. $0 < \liminf_{n\to\infty} \frac{d_n}{\log n} \le \limsup_{n\to\infty} \frac{d_n}{\log n} \le A < \infty$. In this case
an estimator can have risk at $f_0$ converging at an algebraic rate faster
than the minimax rate while maintaining the minimax convergence rate
over $F(\alpha, M)$.

Case C. $\liminf_{n\to\infty} \frac{d_n}{\log n} = \infty$. In this case a minimax rate optimal estimator
can have its risk at $f_0$ converging at a rate which is faster than any
algebraic rate. Hence an estimator can achieve a high degree of superefficiency
at $f_0$ without paying a penalty in terms of its maximum risk over
$F(\alpha, M)$.

An interesting consequence of these results is that for a prespecified
shrinking neighborhood of size $n^{-\gamma}$ superefficient estimators which are also
minimax rate optimal exist for $F(\alpha, M)$ if and only if $0 < \alpha < \frac{1-\gamma}{2\gamma}$. In particular,
for $\gamma \ge 1$ there are no minimax superefficient estimators over any
Hölder class $F(\alpha, M)$ and for $0 < \gamma < 1$ superefficient minimax rate optimal
estimators exist only for the less smooth function spaces.
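The smoothness threshold can be checked with a few lines of arithmetic. Writing $c_n = n^{-\gamma} = d_n n^{-1/(1+2\alpha)}$ gives $d_n = n^{1/(1+2\alpha) - \gamma}$, which diverges exactly when $\alpha < (1-\gamma)/(2\gamma)$. The sketch below encodes this calculation; the function names are hypothetical and chosen only for illustration.

```python
def scaled_size_exponent(alpha, gamma):
    """Exponent e with d_n = n**e when c_n = n**(-gamma)."""
    return 1.0 / (1.0 + 2.0 * alpha) - gamma

def critical_smoothness(gamma):
    """Upper limit (exclusive) on alpha for which d_n diverges.

    Returns 0.0 when gamma >= 1, meaning that no Holder class qualifies.
    """
    return max(0.0, (1.0 - gamma) / (2.0 * gamma))

for gamma in (0.25, 0.5, 1.0):
    print(gamma, critical_smoothness(gamma))
```

For instance, with $\gamma = 1/2$ the exponent of $d_n$ is positive for $\alpha = 1/4$ and negative for $\alpha = 1$, so only the rougher class admits a superefficient estimator that is also minimax rate optimal.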

2.2. Superefficiency in global estimation. An interesting special case of
the results considered in the previous section is that of estimation under
mean integrated squared error which corresponds to the choice of $x_0 = \frac{1}{2}$
and $c_n = \frac{1}{2}$. In this case the results of Theorem 1 show that an estimator
can simultaneously attain the minimax rate over $F(\alpha, M)$ and a high degree
of superefficiency at any specific $f_0$ in the interior of $F(\alpha, M)$. The following
corollary of Theorem 1 precisely quantifies how superefficient the estimator
can be while maintaining the minimax rate of convergence over $F(\alpha, M)$.

Corollary 1. Let $0 < M' < M$ and $f_0 \in F(\alpha, M')$. Suppose

(11)    $\limsup_{n\to\infty} n^{1/(1+2\alpha)} \cdot (\log B_n)^{-1} = 0.$

If $\hat f_n$ is an estimator based on (2) satisfying

(12)    $\limsup_{n\to\infty} B_n E_{f_0} \|\hat f_n - f_0\|_2^2 < \infty,$

then

(13)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} E_f \|\hat f_n - f\|_2^2 = \infty.$

Thus, a minimax rate optimal estimator cannot have risk at $f_0$ converging
faster than $e^{-Dn^{1/(1+2\alpha)}}$ for all $D > 0$.

Condition (11) is sharp. That is, there exist estimators which converge
super-fast at any fixed $f_0 \in F(\alpha, M)$ with the rate of $e^{-Dn^{1/(1+2\alpha)}}$ and yet
still attain the minimax rate uniformly over the class $F(\alpha, M)$.

Theorem 2. Let $f_0 \in F(\alpha, M)$ be fixed. For any constant $D > 0$ there
exists an estimator which satisfies

(14)    $\limsup_{n\to\infty} e^{Dn^{1/(1+2\alpha)}} E_{f_0} \|\hat f_n - f_0\|_2^2 < \infty$

and

(15)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} E_f \|\hat f_n - f\|_2^2 < \infty.$

The theorem guarantees the existence of such superefficient estimators. 
One such estimator based on block thresholding of empirical wavelet coeffi- 
cients is given by (63) and (64) in Section 6. 

2.3. Connection to adaptation. The results on superefficiency given in
Section 2.1 have direct implications for adaptation. Consider two function
classes $F(\alpha_1, M)$ and $F(\alpha_2, M)$ with $0 < \alpha_1 < \alpha_2 \le 1$. Then $F(\alpha_2, M) \subset F(\alpha_1, M)$
and a fully rate adaptive estimator $\hat f_n$ over these classes would
need to satisfy

(16)    $\sup_{f \in F(\alpha_i, M)} R(\hat f_n, f; x_0, c_n) \asymp n^{-2\alpha_i/(2\alpha_i+1)}$

for both $i = 1$ and $i = 2$. The risk of $\hat f_n$ for each $f \in F(\alpha_2, M)$ must then converge
faster than the minimax risk over the larger parameter space $F(\alpha_1, M)$.
Hence such estimators must be superefficient at each $f \in F(\alpha_2, M)$ with respect
to $F(\alpha_1, M)$. The results in Theorem 1 can then be applied to yield
corresponding lower bounds for adaptation over shrinking neighborhoods.
These results are summarized in the following corollary.

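Corollary 2. Consider two function classes $F(\alpha_1, M_1)$ and $F(\alpha_2, M_2)$
with $\alpha_1 < \alpha_2$. Let $0 < x_0 < 1$ and $c_n = d_n n^{-1/(1+2\alpha_1)}$. If $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1} = 0$, then

(17)    $\max_{i=1,2} \limsup_{n\to\infty} n^{2\alpha_i/(1+2\alpha_i)} \inf_{\hat f_n} \sup_{f \in F(\alpha_i, M_i)} R(\hat f_n, f; x_0, c_n) = \infty.$

More specifically, suppose $\hat f_n$ is any estimator satisfying

(18)    $\limsup_{n\to\infty} n^{r} \sup_{f \in F(\alpha_2, M_2)} R(\hat f_n, f; x_0, c_n) < \infty$

for some $r > \frac{2\alpha_1}{1+2\alpha_1}$.

(i) If $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha_1)} = 0$, then

(19)    $\liminf_{n\to\infty} \left(\frac{n}{\log n}\right)^{2\alpha_1/(1+2\alpha_1)} \sup_{f \in F(\alpha_1, M_1)} R(\hat f_n, f; x_0, c_n) > 0.$

(ii) If $\liminf_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha_1)} > 0$ and $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1} = 0$, then

(20)    $\liminf_{n\to\infty} n^{2\alpha_1/(1+2\alpha_1)} \cdot \frac{d_n}{\log n} \sup_{f \in F(\alpha_1, M_1)} R(\hat f_n, f; x_0, c_n) > 0.$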
The results in this corollary state that it is impossible to adaptively attain
the minimax rates over the two function classes with different convergence 
rates whenever the size of the neighborhood is "too small." In Section 3 it 
is shown that the lower bounds on the cost of adaptation given by (19) and 
(20) are in fact sharp. 

3. Adaptive estimation. We now turn our attention to adaptive estima- 
tion and the construction of adaptive estimators. In this section the focus is 
on adaptation over smoothness classes for a given shrinking neighborhood. 
Wavelet thresholding estimators are constructed which attain the bounds 
given by (19) and (20). In Section 4 we shall consider adaptation to both 
smoothness and to the size of the neighborhood. 

3.1. Wavelet thresholding. Let $\phi$ and $\psi$ be a pair of compactly supported
father and mother wavelets which generate an orthonormal basis of $L^2[0, 1]$
through dilation and translation and where, as is typical, $\phi$ is chosen to satisfy
$\int \phi = 1$. The support lengths of $\phi$ and $\psi$ are written as $N_\phi$ and $N_\psi$,
respectively.

Throughout this paper it is also assumed that $\psi$ is $r$-regular, meaning it
has $r$ vanishing moments and $r$ continuous derivatives. Under these assumptions
let

$\phi_{j,k}(t) = 2^{j/2} \phi(2^j t - k), \quad \psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k).$

Then for some choice of $j_0$ the collection $\{\phi_{j_0,k}, k = 1, \ldots, 2^{j_0}; \psi_{j,k}, j \ge j_0, k = 1, \ldots, 2^j\}$
with appropriate boundary corrections is an orthonormal basis of
$L^2[0, 1]$. See Cohen, Daubechies, Jawerth and Vial (1993), Daubechies (1994)
and Meyer (1991) for further details on wavelet bases on the unit interval
$[0, 1]$. For wavelets on the line, see Daubechies (1992) and Meyer (1992).
A function $f : [0, 1] \to \mathbb{R}$ can then be expanded in this orthonormal series,

$f(x) = \sum_{k=1}^{2^{j_0}} \xi_{j_0,k} \phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k=1}^{2^j} \theta_{j,k} \psi_{j,k}(x),$

where $\xi_{j_0,k}$ are the wavelet coefficients at the coarse level and $\theta_{j,k}$ are coefficients
at the detail levels.

Under the orthonormal wavelet basis, the Gaussian model (2) is equivalent
to the sequence model

$y_{j_0,k} = \xi_{j_0,k} + n^{-1/2} z_{j_0,k}, \quad k = 1, \ldots, 2^{j_0}, \qquad y_{j,k} = \theta_{j,k} + n^{-1/2} z_{j,k}, \quad j \ge j_0,\ k = 1, \ldots, 2^j,$

where the $z_{j,k}$ are independent standard normal variables.
Set 
One particularly effective technique for estimating the wavelet coefficients
is that based on block thresholding. Block thresholding estimates the wavelet 
coefficients in groups rather than individually, making simultaneous deci- 
sions to retain or to discard all the empirical coefficients within a block. 
It increases estimation accuracy by using information about neighboring 
wavelet coefficients balancing variance and bias along the curve. More de- 
tails of such adaptive smoothing can be found in Hall, Kerkyacharian and 
Picard (1998) and Cai (1999, 2002). More standard term-by-term thresh- 
olding rules can be thought of as a special case of block thresholding with 
block size one. 

The block thresholding rules used in the above-mentioned papers are con- 
structed by grouping wavelet coefficients only at the same resolution level. 
In our context it is necessary to use block thresholding rules which employ 
vertical blocking of coefficients across different resolution levels as well as the 
commonly used horizontal blocking of wavelet coefficients at the same resolu- 
tion level. We thus give below a generic description of a general block thresh- 
olding estimator which possibly uses both horizontal and vertical blocking. 

Let $J > j_0$ be some dividing resolution level. Group the wavelet coefficients
from level $j_0$ to level $J$ into nonoverlapping blocks of length $L$. Let $B_i$ be
the set of indices for coefficients in the $i$th block and let $S_i^2 = \sum_{(j,k) \in B_i} y_{j,k}^2$
be the sum of squares for this block. The block thresholding estimator of
the wavelet coefficients has the form

(23)    $\hat\theta_{j,k} = \eta(S_i^2)\, y_{j,k}$ for $(j,k) \in B_i$, $j < J$, and $\hat\theta_{j,k} = 0$ for $j \ge J$,

where $\eta(S_i^2)$ is some thresholding function. For example, one can take $\eta(S_i^2) = I(S_i^2 > \lambda)$ or

(24)    $\eta(S_i^2) = \left(1 - \frac{\lambda}{S_i^2}\right)_+,$

where $\lambda$ is some thresholding constant. The shrinkage rule (24) is used
throughout this paper with a variety of values of $\lambda$ and $L$.

3.2. Adaptive estimation on given neighborhoods. The lower bound on
the performance of an adaptive estimator over a collection of Hölder classes
$F(\alpha, M)$ has been given in Corollary 2 of Section 2.3. For neighborhoods
with $c_n < \frac{\log n}{n}$ an estimator is given in Section 4 which adapts both to
smoothness and to the size of neighborhood while attaining the bounds of
Corollary 2. In this section, an estimator designed for neighborhoods with
$c_n \ge \frac{\log n}{n}$ is given. It is a wavelet estimator based on a block thresholding
scheme. Using the same notation as in Section 3.1, let $J$, $J_*$ and $J^*$ be the
smallest integers satisfying

$2^J \ge n, \quad 2^{J_*} \ge c_n^{-1} \quad$ and $\quad 2^{J^*} \ge c_n^{-1} \log n,$

respectively. Then in the case $c_n \ge \frac{\log n}{n}$ considered here, it follows that $J^* \le J$.
We set $J_* = j_0$ when $J_* < j_0$ and let

$H_j = \{(j, k) : \mathrm{supp}(\psi_{j,k}) \cap [x_0 - c_n, x_0 + c_n] \ne \emptyset\}$ and $H^* = \bigcup_{J_* \le j < J^*} H_j.$

Then, for some constant $C > 0$,

$\mathrm{Card}(H_j) \le C N_\psi$ if $j \le J_*$, and $\mathrm{Card}(H_j) \le C N_\psi 2^j c_n$ if $j > J_*$,

and $\mathrm{Card}(H^*) \asymp \log n$, where $N_\psi$ is the length of the support of $\psi$.
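To make the three dividing levels concrete, here is a small sketch that computes them for given $n$ and $c_n$. It assumes the reading $2^J \ge n$, $2^{J_*} \ge c_n^{-1}$ and $2^{J^*} \ge c_n^{-1}\log n$ with a natural logarithm; the function names `smallest_level` and `dividing_levels` are hypothetical and not part of the paper.

```python
import math

def smallest_level(bound):
    """Smallest integer j >= 0 with 2**j >= bound."""
    j = 0
    while 2 ** j < bound:
        j += 1
    return j

def dividing_levels(n, c_n):
    """Return (J, J_lower, J_upper) for sample size n and half-width c_n.

    J_lower plays the role of J_* and J_upper the role of J^*; the
    natural logarithm is an assumption about the meaning of log n.
    """
    J = smallest_level(n)
    J_lower = smallest_level(1.0 / c_n)
    J_upper = smallest_level(math.log(n) / c_n)
    return J, J_lower, J_upper

print(dividing_levels(1024, 2 ** -4))
```

Levels below `J_lower` contain only a bounded number of wavelets touching the neighborhood, levels from `J_lower` up to `J_upper` are the ones that would feed the vertical block, and levels from `J_upper` up to `J` are blocked horizontally.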

The estimator we propose, a hybrid estimator of soft thresholding, vertical
block thresholding and horizontal block thresholding, can be described in
four steps as follows.

1. For empirical coefficients $y_{j,k}$ between levels $j_0$ and $J_*$ apply the term-by-term
soft thresholding rule. The soft thresholding rule is also applied to
coefficients at levels between $J_*$ and $J^*$ where $(j, k) \notin H^*$, in which case
the support of the corresponding wavelet basis function $\psi_{j,k}$ has empty
intersection with the interval $[x_0 - c_n, x_0 + c_n]$.

2. Group all the empirical coefficients $y_{j,k}$ with $(j, k) \in H^*$ into a single
vertical block and denote by $S_*^2 = \sum_{(j,k) \in H^*} y_{j,k}^2$ the sum of squared
coefficients in the vertical block. Apply a single James-Stein shrinkage rule
of the form (24) to the coefficients in this block.

3. At each resolution level $J^* \le j < J$, divide the empirical wavelet coefficients
$y_{j,k}$ into nonoverlapping blocks of length $L = \log n$. Denote by $(jb)$
the set of indices of the coefficients in the $b$th block at level $j$, that is,
$(jb) = \{(j, k) : (b-1)L + 1 \le k \le bL\}$, and let $S_{(jb)}^2 = \sum_{k \in (jb)} y_{j,k}^2$ denote the
sum of squares for the block $(jb)$. Then apply the James-Stein shrinkage
rule to each block $(jb)$ for $J^* \le j < J$.

4. For $j \ge J$, estimate all $\theta_{j,k}$ by 0.

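More precisely, each coefficient $\theta_{j,k}$ is estimated by

(25)    $\hat\theta_{j,k} = \begin{cases} \mathrm{sgn}(y_{j,k}) \left(|y_{j,k}| - \sqrt{2 n^{-1} \log n}\right)_+, & \text{if } j_0 \le j < J^* \text{ and } (j,k) \notin H^*, \\ \eta(S_*^2)\, y_{j,k}, & \text{if } (j,k) \in H^*, \\ \left(1 - \frac{\lambda_* L n^{-1}}{S_{(jb)}^2}\right)_+ y_{j,k}, & \text{if } J^* \le j < J \text{ and } k \in (jb), \\ 0, & \text{if } j \ge J, \end{cases}$

where $\eta(S_*^2)$ denotes the James-Stein rule of the form (24) applied to the
vertical block of step 2 and $\lambda_* = 4.50524$ is a constant satisfying
$\lambda_* - \log \lambda_* - 1 = 2$. Define the wavelet estimator of $f$ by

(26)    $\hat f_n(x) = \sum_{k=1}^{2^{j_0}} y_{j_0,k} \phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k=1}^{2^j} \hat\theta_{j,k} \psi_{j,k}(x)$

with $\hat\theta_{j,k}$ given in (25). This estimator attains the lower bounds in Corollary 2
at least when $c_n \ge \frac{\log n}{n}$. For smaller neighborhoods, the estimator given by
(32) and (33) in Section 4 also attains the lower bounds. These results are
summarized in the following theorem.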

Theorem 3. When $c_n \ge \frac{\log n}{n}$, let $\hat f_n$ be the estimator given by (25) and
(26) where the wavelet $\psi$ is $r$-regular with $r > \alpha$, whereas if $c_n < \frac{\log n}{n}$ let
$\hat f_n$ be the BlockJS estimator given in (32) and (33). Let $0 < x_0 < 1$ and
$c_n = d_n n^{-1/(1+2\alpha)}$.

(i) If $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha)} < \infty$, then

(27)    $\limsup_{n\to\infty} \left(\frac{n}{\log n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(ii) If $\liminf_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha)} = \infty$ and $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1} = 0$, then

(28)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \cdot \frac{d_n}{\log n} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(iii) If $\liminf_{n\to\infty} d_n \cdot (\log n)^{-1} > 0$, then

(29)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

In view of Theorem 1, the estimator given in (25) and (26) attains the
adaptive minimax rate for estimating $f$ over the neighborhood $[x_0 - c_n, x_0 + c_n]$.

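A particularly interesting choice of $c_n$, $c_n = n^{-\gamma}$, is summarized in the
following corollary which shows that fully rate optimal adaptation can be
achieved over $F(\alpha, M)$ if and only if $0 < \alpha < \frac{1-\gamma}{2\gamma}$.

Corollary 3. Let $\hat f_n$ be the estimator given in (25) and (26) and let
$c_n = n^{-\gamma}$ for some $0 < \gamma < 1$. Suppose the wavelet $\psi$ is chosen to be $r$-regular
with $r > \frac{1-\gamma}{2\gamma}$. Then for $0 < \alpha < \frac{1-\gamma}{2\gamma}$,

(30)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty,$

and for $\frac{1-\gamma}{2\gamma} \le \alpha < r$,

(31)    $\limsup_{n\to\infty} \left(\frac{n}{\log n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$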
4. Adaptation over smoothness and neighborhoods. In nonparametric
function estimation, it is common to fix a risk measure such as integrated 
squared error or squared error at a given point and to construct estimators 
which adapt across a range of smoothness classes. In our setting of shrinking 
neighborhoods, it is natural to consider two different types of adaptation. 
One is to adapt to the unknown smoothness of the underlying functions while 
the risk is measured over a given sequence of shrinking neighborhoods as in 
Section 3. A more ambitious and general adaptation goal is to adapt both 
to the unknown smoothness and the shrinking neighborhood over which the 
risk is measured. 

This latter approach is most appropriate when the goal is to construct 
spatially adaptive estimators. It gives a more complete analysis with a mul- 
tiresolution view of risk which spans a whole range of local and global mea- 
sures of risk. Ideally we would like to construct an estimator which is "fully" 
adaptive — attaining the best adaptive rates for all choices of neighborhood 
sizes. The benchmark for such estimators is provided in Theorem 1. We shall 
show below that the BlockJS estimator [Cai (1999)] is nearly fully adaptive. 
This BlockJS procedure can be described as follows. 

Expand the Gaussian process (2) in an orthonormal wavelet basis as in
Section 3.1. At each resolution level $j < J = [\log_2 n]$ divide the empirical
wavelet coefficients $y_{j,k}$ into nonoverlapping blocks of length $L = \log n$. Denote
by $(jb)$ the set of indices of the coefficients in the $b$th block at level $j$,
that is,

$(jb) = \{k : (b-1)L + 1 \le k \le bL\}.$

Let $S_{(jb)}^2 = \sum_{k \in (jb)} y_{j,k}^2$ denote the sum of squares for the block $(jb)$ and let
$\lambda_* = 4.50524$ be given as in Section 3, the root of the equation $\lambda - \log \lambda - 1 = 2$.
We then apply the James-Stein shrinkage rule to each block $(jb)$ for
$j_0 \le j < J$,

(32)    $\hat\theta_{j,k} = \begin{cases} \left(1 - \frac{\lambda_* L n^{-1}}{S_{(jb)}^2}\right)_+ y_{j,k}, & \text{for } k \in (jb),\ j < J, \\ 0, & \text{for } j \ge J. \end{cases}$

The BlockJS estimator $\hat f_n$ of the whole function $f$ is then given by

(33)    $\hat f_n(x) = \sum_{k=1}^{2^{j_0}} y_{j_0,k} \phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k=1}^{2^j} \hat\theta_{j,k} \psi_{j,k}(x).$
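The blockwise James-Stein step can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: it takes the empirical coefficients of a single resolution level as a plain array, rounds $L = \log n$ (natural logarithm) to an integer, and thresholds a trailing short block using its own length; all three choices, and the function names, are assumptions. The constant $\lambda_*$ is recovered numerically as the root of $\lambda - \log\lambda - 1 = 2$.

```python
import math
import numpy as np

def js_constant(target=2.0, lo=1.0, hi=10.0, tol=1e-12):
    """Root of lam - log(lam) - 1 = target on [lo, hi], found by bisection."""
    def g(lam):
        return lam - math.log(lam) - 1.0 - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def blockjs_level(y, n, lam=None):
    """Block James-Stein shrinkage of one level's empirical coefficients."""
    if lam is None:
        lam = js_constant()
    y = np.asarray(y, dtype=float)
    L = max(1, int(round(math.log(n))))  # block length ~ log n (rounding is an assumption)
    out = np.zeros_like(y)
    for start in range(0, y.size, L):
        block = y[start:start + L]
        s2 = float(np.sum(block ** 2))
        # Shrinkage factor (1 - lam * L / (n * S^2))_+, with L taken as the
        # actual block length so that a short trailing block is handled.
        factor = max(0.0, 1.0 - lam * len(block) / (n * s2)) if s2 > 0 else 0.0
        out[start:start + L] = factor * block
    return out

coeffs = [1.0] * 7 + [0.01] * 7   # one strong block, one weak block (toy data)
print(js_constant())
print(blockjs_level(coeffs, n=1000))
```

In this toy input the first block carries signal well above the noise level $n^{-1/2}$ and is only mildly shrunk, while the second block falls below the threshold and is set to zero as a whole, which is the simultaneous keep-or-discard behavior described in Section 3.1.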

Theorem 4. Let $\hat f_n$ be the BlockJS estimator given in (32) and (33)
and let $0 < x_0 < 1$ and $c_n = d_n n^{-1/(1+2\alpha)}$. Suppose the wavelet $\psi$ is $r$-regular
with $r > \alpha$.

(i) If $\limsup_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha)} < \infty$, then

(34)    $\limsup_{n\to\infty} \left(\frac{n}{\log n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(ii) If $\liminf_{n\to\infty} d_n \cdot (\log n)^{-1/(1+2\alpha)} = \infty$ and $\limsup_{n\to\infty} d_n \cdot [(\log n)(\log\log n)]^{-1} = 0$, then

(35)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \cdot \frac{d_n}{(\log n)(\log\log n)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

(iii) If $\liminf_{n\to\infty} d_n \cdot [(\log n)(\log\log n)]^{-1} > 0$, then

(36)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$

This theorem shows that the BlockJS estimator adapts well to the un- 
known smoothness across a wide range of shrinking neighborhoods. Just as 
in Section 3 the special choice $c_n = n^{-\gamma}$ is particularly interesting. Although
the results of the following corollary are similar to those given in Corollary 3 
it should be noted that the BlockJS estimator does not depend on the size 
or location of the neighborhood. Hence the BlockJS estimator exhibits very 
strong spatial and parameter space adaptivity. 



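Corollary 4. Let $\hat f_n$ be the BlockJS estimator and let $0 < x_0 < 1$ and
$c_n = n^{-\gamma}$ for some $\gamma > 0$. Suppose the wavelet $\psi$ is $r$-regular with $r > \frac{1-\gamma}{2\gamma}$.
Then for $0 < \alpha < \frac{1-\gamma}{2\gamma}$,

(37)    $\limsup_{n\to\infty} n^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty,$

and for $\frac{1-\gamma}{2\gamma} \le \alpha < r$,

(38)    $\limsup_{n\to\infty} \left(\frac{n}{\log n}\right)^{2\alpha/(1+2\alpha)} \sup_{f \in F(\alpha,M)} R(\hat f_n, f; x_0, c_n) < \infty.$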

5. Discussion. The theory of shrinking neighborhoods gives a multires- 
olution view of the performance of function estimators. It also provides a 
useful benchmark for the evaluation of spatially adaptive procedures. This 
theory can be easily extended to more general settings. One possible extension
is to consider general weight functions. Let $w(x) \ge 0$ be a compactly
supported continuous function satisfying $w(0) > 0$ and $\int w(x)\,dx = 1$. For a
decreasing sequence $c_n \to 0$ and a fixed $x_0 \in (0, 1)$ let

(39)    $W_n(x) = \frac{1}{c_n} w\left(\frac{x - x_0}{c_n}\right).$

The performance of an estimator $\hat f_n$ can then be evaluated with respect to
the weight $W_n$:

(40)    $R(\hat f_n, f; W_n) = E_f \int W_n(x) (\hat f_n(x) - f(x))^2 \, dx.$

This risk can be viewed as a weighted risk concentrated around the point $x_0$,
and the shrinking neighborhoods considered earlier in this paper correspond
to the choice of uniform weight $w(x) = \frac{1}{2} I(-1 \le x \le 1)$.

Under the conditions given above, $w(x) \le C_1$ for all $x$, $w(x) \ge C_2 > 0$ for
$|x| \le a$ and $w(x) = 0$ for $|x| \ge b$ for some constants $C_1$, $C_2$, $a$ and $b$. It is
then easy to see that all the results given in the previous sections carry over
to the risk given in (40).

It is also possible to extend the theory in this paper to a Gaussian process
observed on the whole line. In this setting it is natural to consider a general
weight function $W_n$ where $c_n \to 0$ or $c_n \to \infty$. The latter choice corresponds
to expanding neighborhoods. When $c_n \to 0$ it is easy to see that all the
theory given in the previous sections carries over to this setting. On the
other hand, when $c_n \to \infty$ fully adaptive estimation is always possible and
the block thresholding wavelet estimator given in Section 4 can easily be
extended to a wavelet expansion on the real line.

6. Proofs. Throughout this section, $C$ denotes a generic positive constant
which may vary from place to place. The wavelet notation follows that
given in Section 3.1. The father wavelet $\phi$ and mother wavelet $\psi$ are always
assumed to have compact support with the length of the support denoted
by $N_\phi$ and $N_\psi$, respectively.

6.1. Preparatory results. The following elementary inequalities are useful 
for the evaluation of the risk of wavelet estimators over shrinking neighbor- 
hoods in terms of the wavelet coefficients. 

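Lemma 1. For any $0 \le a < b \le 1$, set

$$S_1(a,b) = \{(j,k) : \mathrm{supp}(\psi_{j,k}) \subseteq [a,b]\} \quad\text{and}\quad S_2(a,b) = \{(j,k) : \mathrm{supp}(\psi_{j,k}) \cap [a,b] \ne \emptyset\}.$$

Then

(41) $$\sum_{(j,k)\in S_1(a,b)} \theta_{j,k}^2 \;\le\; \int_a^b \Big(\sum_{j,k} \theta_{j,k}\psi_{j,k}(x)\Big)^2 dx \;\le\; \sum_{(j,k)\in S_2(a,b)} \theta_{j,k}^2.$$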

Proof. For any $f(x) = \sum_{j,k}\theta_{j,k}\psi_{j,k}(x)$ let $h(x) = f(x)I_{[a,b]}(x)$ and note
that $\int_a^b f^2(x)\,dx = \int_0^1 h^2(x)\,dx$. Let

$$g_1(x) = \sum_{(j,k)\in S_1(a,b)} \theta_{j,k}\psi_{j,k}(x) \quad\text{and}\quad g_2(x) = \sum_{(j,k)\in S_2(a,b)} \theta_{j,k}\psi_{j,k}(x).$$

Then $\|g_i\|_2^2 = \sum_{(j,k)\in S_i(a,b)} \theta_{j,k}^2$ for $i = 1, 2$. It is also easy to see that $g_2(x) = h(x)$
for $x \in [a,b]$ and so $\|g_2\|_2^2 \ge \|h\|_2^2$ and the second inequality in (41)
immediately follows.

We can also write

$$h(x) = \sum_{j,k}\theta_{j,k}\psi_{j,k}(x)I_{[a,b]}(x) = g_1(x) + \sum_{(j,k)\notin S_1(a,b)} \theta_{j,k}\psi_{j,k}(x)I_{[a,b]}(x).$$

Noting that $\mathrm{supp}(g_1) \subseteq [a,b]$, it follows that

$$\int g_1(x) \sum_{(j,k)\notin S_1(a,b)} \theta_{j,k}\psi_{j,k}(x)I_{[a,b]}(x)\,dx = \int g_1(x) \sum_{(j,k)\notin S_1(a,b)} \theta_{j,k}\psi_{j,k}(x)\,dx = 0,$$

and consequently

$$\|h\|_2^2 = \|g_1\|_2^2 + \Big\|\sum_{(j,k)\notin S_1(a,b)} \theta_{j,k}\psi_{j,k}(x)I_{[a,b]}(x)\Big\|_2^2 \;\ge\; \|g_1\|_2^2 = \sum_{(j,k)\in S_1(a,b)} \theta_{j,k}^2$$

and the first inequality in (41) also holds. □
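
The two inequalities in (41) can be checked numerically in a simple special case. The sketch below is an illustration under stated assumptions, not the construction used in the paper: it uses the Haar wavelet (the paper works with smoother compactly supported wavelets), indexes translations by $k = 0, \dots, 2^j - 1$, and restricts $a$ and $b$ to the dyadic grid of width $2^{-J}$ so that the integral is computed exactly. The names `haar_value` and `lemma1_bounds` are hypothetical.

```python
import random


def haar_value(j, k, x):
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), supported on [k 2^-j, (k+1) 2^-j]."""
    u = (2 ** j) * x - k
    if 0.0 <= u < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def lemma1_bounds(theta, J, a_idx, b_idx):
    """Return (lower sum, integral, upper sum) in (41) for a = a_idx/2^J, b = b_idx/2^J.

    `theta` maps (j, k) with 0 <= j < J to a coefficient, so that
    f = sum theta_{j,k} psi_{j,k} is constant on each cell of width 2^-J.
    """
    n = 2 ** J
    a, b = a_idx / n, b_idx / n
    lower = upper = 0.0
    for (j, k), t in theta.items():
        left, right = k / 2 ** j, (k + 1) / 2 ** j
        if a <= left and right <= b:      # support contained in [a, b]: index in S_1
            lower += t * t
        if left <= b and right >= a:      # support meets [a, b]: index in S_2
            upper += t * t
    integral = 0.0
    for i in range(a_idx, b_idx):         # exact, since f is constant on each cell
        x = (i + 0.5) / n
        fx = sum(t * haar_value(j, k, x) for (j, k), t in theta.items())
        integral += fx * fx / n
    return lower, integral, upper


random.seed(0)
J = 5
theta = {(j, k): random.gauss(0.0, 1.0) for j in range(J) for k in range(2 ** j)}
lower, integral, upper = lemma1_bounds(theta, J, 5, 21)
```

For any choice of coefficients the three returned numbers should be ordered as in (41).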



The proofs of the main theorems also rely on bounds on the risk of wavelet 
block thresholding estimators. Lemma 2 summarizes several useful risk up- 
per bounds for such estimators. 

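Lemma 2. Let $y_i = \theta_i + \sigma z_i$ where $z_i \overset{\mathrm{i.i.d.}}{\sim} N(0,1)$, $i = 1, \dots, L$, and let
$\hat\theta_i = (1 - \lambda L\sigma^2/S^2)_+\, y_i$ where $S^2 = \sum_i y_i^2$ and $\lambda > 1$. Then

(42) $$\sum_{i=1}^{L} E(\hat\theta_i - \theta_i)^2 \le \min\Big(\sum_{i=1}^{L}\theta_i^2,\ \lambda L\sigma^2\Big) + 2\lambda e^{-(1/2)(\lambda - \log\lambda - 1)L}\sigma^2.$$

In the special case of $\lambda = 4.50524$ (the root of the equation $\lambda - \log\lambda - 3 = 0$),

(43) $$\sum_{i=1}^{L} E(\hat\theta_i - \theta_i)^2 \le \min\Big(\sum_{i=1}^{L}\theta_i^2,\ \lambda L\sigma^2\Big) + 2\lambda e^{-L}\sigma^2.$$

In addition, suppose $\lambda = 4.50524$ and $|\theta_i| \le c$ for all $i$. Then

(44) $$E(\hat\theta_i - \theta_i)^2 \le 8c^2 + 2\lambda e^{-L}\sigma^2.$$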
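
The shrinkage rule of Lemma 2 is short enough to write out. The following is a minimal sketch, assuming the block is passed as a plain Python list and that the noise variance $\sigma^2$ is known; the function name `block_james_stein` and the zero-block convention are choices made here for illustration.

```python
import math


def block_james_stein(y, sigma2, lam):
    """Apply theta_hat_i = (1 - lam * L * sigma2 / S^2)_+ * y_i to one block y."""
    L = len(y)
    s2 = sum(v * v for v in y)
    if s2 == 0.0:
        # Convention for this sketch: an all-zero block is estimated by zero.
        return [0.0] * L
    factor = max(0.0, 1.0 - lam * L * sigma2 / s2)
    return [factor * v for v in y]


# The constant quoted in Lemma 2 is (numerically) a root of lam - log(lam) - 3 = 0.
LAMBDA = 4.50524
residual = LAMBDA - math.log(LAMBDA) - 3.0

# A block with large energy is shrunk slightly; a weak block is set to zero.
strong = block_james_stein([3.0, 4.0], sigma2=1.0, lam=2.0)
weak = block_james_stein([0.1, 0.1], sigma2=1.0, lam=2.0)
```

The whole block shares one shrinkage factor, so coefficients are kept or killed together; this is the mechanism behind the $\min(\sum\theta_i^2, \lambda L\sigma^2)$ term in (42).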

Proof. Inequality (42) is a direct consequence of the oracle inequality
given in Theorem 1 of Cai (1999) and the bound on the tail probability of
the chi-squared distribution given in Lemma 2 of Cai (1999). Inequality (43)
then follows directly on evaluation of (42). For the proof of (44), it suffices
to consider the case of $\sigma = 1$. In that case note that

$$E(\hat\theta_i - \theta_i)^2 = E\Big\{\Big(1 - \frac{\lambda L}{S^2}\Big) y_i I(S^2 > \lambda L) - \theta_i\Big\}^2$$
$$\le 2\theta_i^2 + 2E\,y_i^2\Big(1 - \frac{\lambda L}{S^2}\Big)^2 I(S^2 > \lambda L)$$
$$\le 2\theta_i^2 + 2E\,y_i^2 I(S^2 > \lambda L).$$

Now note that for fixed $\theta_i$ it is easy to check that $E\,y_i^2 I(S^2 > \lambda L)$ is increasing
in each $|\theta_j|$ for $j \ne i$. Note also that if all $\theta_k$ other than $\theta_i$ are fixed, then by
Lemma 3

(45) $$E\,y_i^2 I(S^2 > \lambda L) = E\Big\{y_i^2\, I\Big(y_i^2 > \lambda L - \sum_{k\ne i} y_k^2\Big)\Big\}$$

is increasing in $|\theta_i|$. Hence $E\,y_i^2 I(S^2 > \lambda L)$ is maximized when all $\theta_j = c$.
When all $\theta_j = c$, $E\,y_i^2 I(S^2 > \lambda L)$ is the same for all $i$ and hence

$$E\,y_i^2 I(S^2 > \lambda L) = L^{-1} E\,S^2 I(S^2 > \lambda L).$$

In the proof of Proposition 1 in Cai (1999) it is shown that

$$E\,S^2 I(S^2 > \lambda L) \le 3\|\theta\|_2^2 + \lambda L(\lambda^{-1}e^{\lambda-1})^{-L/2}.$$

Therefore,

$$E(\hat\theta_i - \theta_i)^2 \le 8c^2 + 2\lambda e^{-(1/2)(\lambda - \log\lambda - 1)L}. \qquad □$$

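Lemma 3. Let $z \sim N(0,1)$ and $y = \theta + z$. Then for any $c > 0$, $E_\theta\, y^2 I(|y| > c)$
is an increasing function of $|\theta|$.

Proof. It suffices to consider $\theta > 0$. Let

$$f(\theta) = \sqrt{2\pi}\,E_\theta\, y^2 I(|y| > c) = \Big(\int_c^\infty + \int_{-\infty}^{-c}\Big) y^2 e^{-(1/2)(y-\theta)^2}\,dy$$

and

$$g(\theta) = \sqrt{2\pi}\,E_\theta\, z^2 I(|y| > c) = \Big(\int_{c-\theta}^\infty + \int_{-\infty}^{-c-\theta}\Big) z^2 e^{-(1/2)z^2}\,dz.$$

Then

$$f'(\theta) = \Big(\int_c^\infty + \int_{-\infty}^{-c}\Big) y^2 (y-\theta) e^{-(1/2)(y-\theta)^2}\,dy = \Big(\int_{c-\theta}^\infty + \int_{-\infty}^{-c-\theta}\Big) (x^3 + 2\theta x^2 + \theta^2 x) e^{-(1/2)x^2}\,dx$$

and

$$g'(\theta) = (c-\theta)^2 e^{-(1/2)(c-\theta)^2} - (c+\theta)^2 e^{-(1/2)(c+\theta)^2}.$$

Note that

$$\int_{-\infty}^{-c-\theta} (x^3 + \theta^2 x) e^{-(1/2)x^2}\,dx = -\int_{c+\theta}^{\infty} (x^3 + \theta^2 x) e^{-(1/2)x^2}\,dx,$$

so for $\theta > 0$

$$f'(\theta) = \Big(\int_{c-\theta}^\infty + \int_{-\infty}^{-c-\theta}\Big) 2\theta x^2 e^{-(1/2)x^2}\,dx + \int_{c-\theta}^{c+\theta} (x^3 + \theta^2 x) e^{-(1/2)x^2}\,dx > 0$$

and so the lemma follows. □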
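
Lemma 3 can also be checked numerically. The sketch below evaluates $E_\theta\, y^2 I(|y| > c)$ in closed form from the standard normal density and tail function, using the elementary identities $\int_a^\infty x^2\varphi(x)\,dx = a\varphi(a) + Q(a)$ and $\int_a^\infty x\varphi(x)\,dx = \varphi(a)$, and then tabulates it on a grid of $\theta$. The function names and the grid are assumptions of this illustration.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncated_second_moment(theta, c):
    """E[y^2 1{|y| > c}] for y ~ N(theta, 1), via the substitution x = y - theta."""
    a = c - theta    # upper region: x > a
    b = -c - theta   # lower region: x < b
    upper = a * normal_pdf(a) + upper_tail(a) \
        + 2.0 * theta * normal_pdf(a) + theta ** 2 * upper_tail(a)
    lower = -b * normal_pdf(b) + upper_tail(-b) \
        - 2.0 * theta * normal_pdf(b) + theta ** 2 * upper_tail(-b)
    return upper + lower


# Tabulate the quantity for c = 2 on theta = 0, 0.1, ..., 5.
values = [truncated_second_moment(0.1 * i, 2.0) for i in range(51)]
is_increasing = all(u < v for u, v in zip(values, values[1:]))
```

As $c \to 0$ the indicator disappears and the quantity reduces to $E_\theta\, y^2 = 1 + \theta^2$, which gives a simple sanity check of the closed form.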

The following lemma is a result from standard wavelet theory. See, for 
example, Daubechies (1992). 

Lemma 4. Suppose the wavelet $\psi$ has compact support and is $r$-regular
with $r > \alpha$. Then there exists a constant $C > 0$ such that for all $f \in F(\alpha, M)$
its wavelet coefficients satisfy

(46) $$|\theta_{j,k}| \le C\,2^{-j((1/2)+\alpha)} \quad\text{for all } j \ge j_0 \text{ and } 1 \le k \le 2^j.$$

6.2. Proof of the main results. 

Proof of Theorem 1. The proof of this theorem is divided into two 
parts. In the first part lower bounds are given and in the second upper 
bounds. For the lower bounds only the first two cases in the theorem need 
to be considered. Since the proofs of these two cases are similar a proof of 
case (i) is given in detail and then only the main changes needed for the 
proof of case (ii) are given. 

Lower bounds. 

Case (i). Let $g : \mathbb{R} \to \mathbb{R}$ be a function satisfying:

(i) $g(x) = \lambda > 0$ for $x \in [-1, 1]$ and $g$ is compactly supported in the
interval $[-A, A]$;

(ii) $|g^{(k)}(x) - g^{(k)}(y)| \le (M - M')|x - y|^{\alpha - k}$, $-\infty < x < y < \infty$, where $k$
is the greatest integer less than or equal to $\alpha$;

(iii) $\int_{-A}^{A} g^2(x)\,dx = 1$.

For sufficiently large $A$ such a function is easy to construct.
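Set

$$\gamma_n = \left(\frac{n}{\log B_n}\right)^{\alpha/(1+2\alpha)} \quad\text{and}\quad \beta_n = \left(\frac{n}{\log B_n}\right)^{1/(1+2\alpha)}$$

and note that

$$n\,\gamma_n^{-2}\beta_n^{-1} = \log B_n.$$

Let $f_{n,\theta} : [0, 1] \to \mathbb{R}$ be defined by

$$f_{n,\theta}(x) = \theta\,\gamma_n^{-1} g(\beta_n(x - x_0)) + f_0(x) \quad\text{for } \theta = 0, 1.$$

It is simple to check that for $\theta = 0$ or 1, $f_{n,\theta} \in F(\alpha, M)$ for all $n$. Note also
that for sufficiently large $n$, say $n \ge N_0$,

(47) $$\rho_n = n\int_0^1 (f_{n,1} - f_{n,0})^2 = \log B_n.$$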
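
The calibration behind (47) is a short exponent computation, and the sketch below checks it numerically. It is an illustration only: the function name `calibration` and the numerical values of $n$, $\log B_n$ and $\alpha$ are arbitrary choices, and the sketch assumes $\int g^2 = 1$ with the support of the perturbation contained in $[0, 1]$, so that $\int_0^1 (f_{n,1} - f_{n,0})^2 = \gamma_n^{-2}\beta_n^{-1}$.

```python
def calibration(n, log_B, alpha):
    """Return (gamma_n, beta_n) for the two-point construction in Case (i)."""
    ratio = n / log_B
    gamma_n = ratio ** (alpha / (1.0 + 2.0 * alpha))
    beta_n = ratio ** (1.0 / (1.0 + 2.0 * alpha))
    return gamma_n, beta_n


n, log_B, alpha = 10 ** 6, 5.0, 1.5
gamma_n, beta_n = calibration(n, log_B, alpha)

# rho_n = n * gamma_n^{-2} * beta_n^{-1} should reproduce log B_n.
rho_n = n * gamma_n ** -2 / beta_n

# The rescaling keeps the Holder constant unchanged: gamma_n^{-1} * beta_n^alpha = 1.
holder_scale = beta_n ** alpha / gamma_n
```

The first identity fixes the separation between the two hypotheses at $\log B_n$; the second is what keeps $f_{n,1}$ inside $F(\alpha, M)$.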

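Write $P_\theta^n$ for the probability measure associated with the process

$$Z_n^*(t) = \int_0^t f_{n,\theta}(x)\,dx + \frac{1}{\sqrt{n}} B^*(t), \qquad 0 \le t \le 1.$$

A sufficient statistic for the family of measures $\{P_0^n, P_1^n\}$ is then given by
the log likelihood ratio $T_n = \log\frac{dP_1^n}{dP_0^n}$, and for $n \ge N_0$,

$$\text{under } P_0^n,\quad T_n \sim N\Big(-\frac{\rho_n}{2}, \rho_n\Big)$$

and

$$\text{under } P_1^n,\quad T_n \sim N\Big(\frac{\rho_n}{2}, \rho_n\Big).$$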

Now based on the Gaussian model (2) let $\hat f_n$ be an estimator of $f$. Decompose
this estimator into components

(48) $$\hat f_n(x) = f_{n,0}(x) + \hat\theta\,(f_{n,1}(x) - f_{n,0}(x)) + h_n(x),$$

where

(49) $$\int_{x_0 - c_n}^{x_0 + c_n} h_n(x)\,(f_{n,1}(x) - f_{n,0}(x))\,dx = 0.$$

Hence, for $\theta = 0$ or 1,

(50) $$\frac{1}{2c_n}\int_{x_0-c_n}^{x_0+c_n} (\hat f_n(x) - f_{n,\theta}(x))^2\,dx \ge (\hat\theta - \theta)^2 \frac{1}{2c_n}\int_{x_0-c_n}^{x_0+c_n} (f_{n,1}(x) - f_{n,0}(x))^2\,dx = (\hat\theta - \theta)^2 \frac{1}{2c_n}\beta_n^{-1}\gamma_n^{-2}\int_{-\beta_n c_n}^{\beta_n c_n} g^2(x)\,dx.$$

It follows from the condition $\limsup_{n\to\infty} d_n\,(\log B_n)^{-1/(1+2\alpha)} = 0$ that for
sufficiently large $n$, say $n \ge N_1$,

$$d_n \le (\log B_n)^{1/(1+2\alpha)},$$

in which case $\beta_n c_n \le 1$. So

(51) $$\frac{1}{2c_n}\int_{x_0-c_n}^{x_0+c_n} (\hat f_n(x) - f_{n,\theta}(x))^2\,dx \ge (\hat\theta - \theta)^2 \lambda^2 \gamma_n^{-2}.$$

If assumption (5) of the theorem holds, there exist a $C_1 < \infty$ and $N_2$ such
that for all $n \ge N_2$,

$$R(\hat f_n, f_{n,0}; x_0, c_n) \le C_1 n^{-2\alpha/(1+2\alpha)} B_n^{-1}.$$

Hence

$$E_{f_{n,0}}(\hat\theta - 0)^2 \le C_1\lambda^{-2} B_n^{-1} (\log B_n)^{-2\alpha/(1+2\alpha)}.$$

Since $T_n$ is sufficient for $\{P_0^n, P_1^n\}$ apply Theorem 1 of Brown and Low
(1996) with $I = e^{\rho_n} = B_n$ for $n \ge N_0$.

Let $N = \max(N_0, N_1, N_2)$. Theorem 1, equation (2.4), of Brown and Low
(1996) then yields for $n \ge N$

(52) $$E_{f_{n,1}}(\hat\theta - 1)^2 \ge 1 - 2C_1^{1/2}\lambda^{-1}(\log B_n)^{-\alpha/(1+2\alpha)}.$$

Combining (51) and (52) yields (6).

Case (ii). Let the function $g$ be constructed similarly as in the proof
of case (i), except that in Condition (ii), $M - M'$ is replaced by
$(M - M')(d/A)^{(1/2)+\alpha}$. Set

$$\gamma_n = \left(\frac{d}{A}\right)^{1/2}\left(\frac{n}{\log B_n}\right)^{\alpha/(1+2\alpha)} \quad\text{and}\quad \beta_n = \frac{A}{d}\left(\frac{n}{\log B_n}\right)^{1/(1+2\alpha)}.$$

Since $\liminf_{n\to\infty} d_n\,(\log B_n)^{-1/(1+2\alpha)} > 0$, there exist constants $d > 0$ and
$N > 0$ such that for all $n \ge N$,

$$d_n \ge d\,(\log B_n)^{1/(1+2\alpha)}.$$

Hence, for $n \ge N$,

$$\rho_n = \log B_n, \qquad \beta_n^{-1}\gamma_n^{-2} = \frac{\log B_n}{n} \qquad\text{and}\qquad \beta_n c_n \ge A.$$

In this case (50) yields

$$\frac{1}{2c_n}\int_{x_0-c_n}^{x_0+c_n} (\hat f_n(x) - f_{n,\theta}(x))^2\,dx \ge \frac{\log B_n}{2d_n}(\hat\theta - \theta)^2\, n^{-2\alpha/(1+2\alpha)}.$$

The remaining steps are the same as in the proof of Case (i) and hence are
omitted.

We now turn to the proof of upper bounds, where the three cases need
to be treated separately. Note, however, that in each case we may assume
without loss of generality that $f_0 = 0$ since we can always recenter the
estimate at any given $f_0$. Let $\{\phi, \psi\}$ be a pair of compactly supported father
and mother wavelets generating an orthonormal basis in $L^2[0, 1]$ where the
support lengths of $\phi$ and $\psi$ are denoted by $N_\phi$ and $N_\psi$, respectively. We
assume that both $\phi$ and $\psi$ have $r > \alpha$ vanishing moments, $\int x^k\phi(x)\,dx = 0$
for $k = 1, \dots, r$ and $\int x^k\psi(x)\,dx = 0$ for $k = 0, 1, \dots, r$. For example, Coiflets
of order greater than $\alpha$ have this property. See Daubechies (1992).

Upper bounds. 

Case (i). Let $j_n$ be the largest integer satisfying $2^{j_n} \le (n/\log B_n)^{1/(1+2\alpha)}$.

For $j \ge 0$ and $1 \le k \le 2^j$ let $\phi_{j,k}(t) = 2^{j/2}\phi(2^j t - k)$. Then $x_0 \in \mathrm{supp}(\phi_{j_n,k})$
for some $k$. Write

$$y_n = 2^{j_n/2}\int \phi_{j_n,k}(t)\,dZ_n^*(t) = 2^{j_n/2}\int f(t)\phi_{j_n,k}(t)\,dt + 2^{j_n/2} n^{-1/2}\int \phi_{j_n,k}(t)\,dW(t) = \bar f + z.$$

Here $z$ is a Gaussian random variable with mean 0 and variance $\sigma_n^2 = 2^{j_n} n^{-1}$,
and $\bar f$ can be regarded as the "mean value" of $f$ on the support of $\phi_{j_n,k}$. Set

$$\delta_n = \mathrm{sgn}(y_n)\big(|y_n| - \sigma_n(2\log B_n)^{1/2}\big)_+$$

and let $\hat f_n$ be an estimator of $f$ with

$$\hat f_n(x) = \delta_n \quad\text{for all } x \in [x_0 - c_n, x_0 + c_n].$$
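
The estimator in Case (i) is a single soft-thresholded observation. A minimal sketch of the rule, assuming $B_n > 1$ and with the hypothetical function name `soft_threshold`, is:

```python
import math


def soft_threshold(y, sigma, B):
    """delta = sgn(y) * (|y| - sigma * sqrt(2 log B))_+, for B > 1."""
    t = sigma * math.sqrt(2.0 * math.log(B))
    return math.copysign(max(abs(y) - t, 0.0), y)


# With sigma = 1 and B = e^2 the threshold is sqrt(2 * 2) = 2.
B = math.exp(2.0)
kept_pos = soft_threshold(3.0, 1.0, B)
kept_neg = soft_threshold(-3.0, 1.0, B)
killed = soft_threshold(1.0, 1.0, B)
```

Observations below the threshold are set to zero, which is what produces the very small risk at $f_0 = 0$; larger values of $B_n$ raise the threshold and so trade more superefficiency at $f_0$ for a larger maximum risk.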

We show below that $\hat f_n$ satisfies both $\hat f_n \in A(f_0)$ and (7). First, it is easy to
verify directly that

$$E(\delta_n - \bar f)^2 \le \min\big(2(\bar f)^2,\ \sigma_n^2(1 + 2\log B_n)\big) + \sigma_n^2 B_n^{-1}$$

and hence

(53) $$R(\hat f_n, f; x_0, c_n) = \frac{1}{2c_n} E_f \int_{x_0-c_n}^{x_0+c_n} (\delta_n - f(x))^2\,dx \le \frac{1}{c_n}\int_{x_0-c_n}^{x_0+c_n} (\bar f - f(x))^2\,dx + \min\big(2(\bar f)^2,\ \sigma_n^2(1 + 2\log B_n)\big) + \sigma_n^2 B_n^{-1}.$$

Now for $f_0 = 0$ the zero function on $[0, 1]$ the first two terms in (53) are both
0. Hence

$$R(\hat f_n, f_0; x_0, c_n) \le n^{-2\alpha/(1+2\alpha)} B_n^{-1} (\log B_n)^{-1/(1+2\alpha)}$$

and it follows that $\hat f_n \in A(f_0)$. It follows from the vanishing moments
property of $\phi$ that for all $f \in F(\alpha, M)$ and for all $x \in [x_0 - c_n, x_0 + c_n]$,

(54) $$|f(x) - \bar f| \le C(M, \phi)\,2^{-\alpha j_n},$$

where $C(M, \phi)$ is a constant depending on $M$ and $\phi$ only. Now (7) follows
by applying (54) to (53):

$$R(\hat f_n, f; x_0, c_n) \le C\left(\frac{\log B_n}{n}\right)^{2\alpha/(1+2\alpha)}(1 + o(1)).$$

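Case (ii). In the second case a wavelet procedure based on block
thresholding is used. Let $J_1$ and $J_2$ be the largest integers satisfying

$$2^{J_1} \le d_n^{-1} n^{1/(1+2\alpha)} \quad\text{and}\quad 2^{J_2} \le d_n^{-1}\log B_n\, n^{1/(1+2\alpha)}.$$

Let

$$H_j = \{(j, k) : \mathrm{supp}(\psi_{j,k}) \cap [x_0 - c_n, x_0 + c_n] \ne \emptyset\} \quad\text{and}\quad H_* = \bigcup_{J_1 \le j \le J_2} H_j.$$

Then it is easy to check that for $j \ge J_1$ the cardinality of the index sets
$H_j$ is of order $2^j c_n$ and so $L_n = \mathrm{Card}(H_*) = b_n\log B_n$ with $b_* \le b_n \le b^*$
for some positive constants $b_*$ and $b^*$. Denote by $S^2$ the sum of all the
squared empirical wavelet coefficients $y_{j,k}$ with indices in $H_*$. Applying a
block thresholding rule to the coefficients,

$$\hat\theta_{j,k} = \Big(1 - \frac{\lambda L_n n^{-1}}{S^2}\Big)_+ y_{j,k} \quad\text{for all } (j, k) \in H_*.$$

Then it follows from Lemma 2 that

(55) $$E\sum_{(j,k)\in H_*} (\hat\theta_{j,k} - \theta_{j,k})^2 \le \min\Big(\sum_{(j,k)\in H_*}\theta_{j,k}^2,\ \lambda L_n n^{-1}\Big) + 2n^{-1}e^{-(1/2)(\lambda - \log\lambda - 1)L_n}.$$

Let the thresholding constant $\lambda$ be chosen such that

$$\tfrac{1}{2}(\lambda - \log\lambda - 1)\,b_* = 1.$$

Then the second term in the right-hand side of (55) is bounded from above
by $2n^{-1}B_n^{-1}$. Applying Lemma 1, we have

(56) $$R(\hat f_n, f; x_0, c_n) \le \frac{1}{2c_n}\sum_{(j,k)\in H_*} E(\hat\theta_{j,k} - \theta_{j,k})^2 + \frac{1}{2c_n}\sum_{j>J_2}\sum_{(j,k)\in H_j}\theta_{j,k}^2 \le \frac{1}{2c_n}\min\Big(\sum_{(j,k)\in H_*}\theta_{j,k}^2,\ \lambda L_n n^{-1}\Big) + c_n^{-1}n^{-1}B_n^{-1} + \frac{1}{2c_n}\sum_{j>J_2}\sum_{(j,k)\in H_j}\theta_{j,k}^2.$$

For $f = f_0 = 0$, the first and the third terms in (56) are both 0, hence

$$R(\hat f_n, f_0; x_0, c_n) \le n^{-2\alpha/(1+2\alpha)} B_n^{-1}(\log B_n)^{-1/(1+2\alpha)}$$

and so $\hat f_n \in A(f_0)$. For $f \in F(\alpha, M)$, it follows from Lemma 4 that

(57) $$|\theta_{j,k}| \le C\,2^{-j((1/2)+\alpha)}$$

with the constant $C$ not depending on $f$. Hence

(58) $$\frac{1}{2c_n}\sum_{j>J_2}\sum_{(j,k)\in H_j}\theta_{j,k}^2 \le \frac{1}{2c_n}\sum_{j>J_2} C c_n 2^j\,2^{-j(1+2\alpha)} = C n^{-2\alpha/(1+2\alpha)} d_n^{-1}\log B_n.$$

Now (9) follows from (56) and (58).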

Case (iii). Finally, we turn to the third case where we will use the same
notation as in Case (ii). Let $J_2$ and $J_3$ be the largest integers satisfying
$2^{J_2} \le d_n^{-1}\log B_n\, n^{1/(1+2\alpha)}$ and $2^{J_3} \le n^{1/(1+2\alpha)}$, respectively. (If $d_n < \log B_n$,
choose $J_3 = J_2$.) Denote by $L_j$ the cardinality of the index sets $H_j$. Then,
for $j \ge J_2$, there exist positive constants $b_*$ and $b^*$ such that $b_* 2^j c_n \le L_j \le b^* 2^j c_n$.
Denote by $S_j^2$ the sum of all the squared empirical wavelet coefficients
$y_{j,k}$ at level $j$ with $(j,k) \in H_j$. Applying a block thresholding rule to the
coefficients level by level,

$$\hat\theta_{j,k} = \Big(1 - \frac{\lambda L_j n^{-1}}{S_j^2}\Big)_+ y_{j,k} \quad\text{for all } J_2 \le j \le J_3 \text{ and } (j, k) \in H_j.$$

Then again it follows from Lemma 2 that

(59) $$E\sum_{(j,k)\in H_j} (\hat\theta_{j,k} - \theta_{j,k})^2 \le \min\Big(\sum_{(j,k)\in H_j}\theta_{j,k}^2,\ \lambda L_j n^{-1}\Big) + 2n^{-1}e^{-(1/2)(\lambda - \log\lambda - 1)L_j}.$$

Write $L_{J_2} = b_n\log B_n$ with $b_* \le b_n \le b^*$. We choose the thresholding constant
$\lambda$ such that

$$\tfrac{1}{2}(\lambda - \log\lambda - 1)\,b_* = 1.$$

Then the second term on the right-hand side of (59) is bounded from above
by $2n^{-1}B_n^{-1}$ for $j = J_2$ and

(60) $$\sum_{j=J_2}^{J_3} 2n^{-1}e^{-(1/2)(\lambda - \log\lambda - 1)L_j} \le 4n^{-1}B_n^{-1}.$$

Lemma 1 yields

(61) $$R(\hat f_n, f; x_0, c_n) \le \frac{1}{2c_n}\sum_{j=J_2}^{J_3}\min\Big(\sum_{(j,k)\in H_j}\theta_{j,k}^2,\ \lambda L_j n^{-1}\Big) + 2n^{-2\alpha/(1+2\alpha)}B_n^{-1}d_n^{-1} + \frac{1}{2c_n}\sum_{j>J_3}\sum_{(j,k)\in H_j}\theta_{j,k}^2.$$

Once again for $f = f_0 = 0$, the first and the third terms in (61) are both 0,
hence

$$R(\hat f_n, f_0; x_0, c_n) \le 2n^{-2\alpha/(1+2\alpha)}B_n^{-1}d_n^{-1}$$

and so $\hat f_n \in A(f_0)$. The coefficient bound (57) yields

(62) $$\frac{1}{2c_n}\sum_{j>J_3}\sum_{(j,k)\in H_j}\theta_{j,k}^2 \le \frac{1}{2c_n}\sum_{j>J_3}\sum_{(j,k)\in H_j} C\,2^{-j(1+2\alpha)} = C n^{-2\alpha/(1+2\alpha)}.$$

Now (10) follows from (61) and (62). □

Proof of Theorem 2. As in the proof of Theorem 1 it suffices to
consider $f_0 = 0$. Expand the Gaussian process (2) in an orthonormal wavelet
basis as in Section 3.1. Suppose the wavelet $\psi$ is chosen to be $r$-regular with
$r > \alpha$. Let $J'$ be the largest integer satisfying $2^{J'} \le n^{1/(1+2\alpha)}$. Then the total
number $L'$ of wavelet coefficients up to (and including) the level $J'$ is less
than $2n^{1/(1+2\alpha)}$ and larger than or equal to $n^{1/(1+2\alpha)}$. Group all the empirical
wavelet coefficients $\tilde y_{j_0,k}$ and $y_{j,k}$ up to the level $J'$ into a single block and
apply a James-Stein type rule to the coefficients. More specifically, denote
the sum of the squared empirical coefficients up to the level $J'$ by

$$S^2 = \sum_{k=1}^{2^{j_0}} \tilde y_{j_0,k}^2 + \sum_{j=j_0}^{J'}\sum_{k=1}^{2^j} y_{j,k}^2$$

and define the estimator of the wavelet coefficients by

(63) $$\hat\xi_{j_0,k} = \Big(1 - \frac{\lambda L' n^{-1}}{S^2}\Big)_+ \tilde y_{j_0,k} \quad\text{for } 1 \le k \le 2^{j_0},$$
$$\hat\theta_{j,k} = \Big(1 - \frac{\lambda L' n^{-1}}{S^2}\Big)_+ y_{j,k} \quad\text{for } j_0 \le j \le J',\ 1 \le k \le 2^j,$$
$$\hat\theta_{j,k} = 0 \quad\text{otherwise},$$

where $\lambda$ is a constant satisfying $\lambda - \log\lambda - 1 = 2D$. The corresponding
estimator $\hat f_n$ of $f$ is the wavelet series with $\hat\xi_{j_0,k}$ and $\hat\theta_{j,k}$ as coefficients:

(64) $$\hat f_n(x) = \sum_{k=1}^{2^{j_0}} \hat\xi_{j_0,k}\phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_{k=1}^{2^j} \hat\theta_{j,k}\psi_{j,k}(x).$$

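It follows from (42) in Lemma 2 that

$$\sum_k E(\hat\xi_{j_0,k} - \xi_{j_0,k})^2 + \sum_{j=j_0}^{J'}\sum_k E(\hat\theta_{j,k} - \theta_{j,k})^2$$
$$\le \min\Big(\sum_k \xi_{j_0,k}^2 + \sum_{j=j_0}^{J'}\sum_k \theta_{j,k}^2,\ \lambda L' n^{-1}\Big) + 2n^{-1}e^{-(1/2)(\lambda - \log\lambda - 1)L'}$$
$$\le \min\Big(\sum_k \xi_{j_0,k}^2 + \sum_{j=j_0}^{J'}\sum_k \theta_{j,k}^2,\ 2\lambda n^{-2\alpha/(1+2\alpha)}\Big) + 2n^{-1}e^{-Dn^{1/(1+2\alpha)}}.$$

Hence

(65) $$E_f\|\hat f_n - f\|_2^2 = \sum_k E(\hat\xi_{j_0,k} - \xi_{j_0,k})^2 + \sum_{j=j_0}^{J'}\sum_k E(\hat\theta_{j,k} - \theta_{j,k})^2 + \sum_{j=J'+1}^{\infty}\sum_k \theta_{j,k}^2$$
$$\le \min\Big(\sum_k \xi_{j_0,k}^2 + \sum_{j=j_0}^{J'}\sum_k \theta_{j,k}^2,\ 2\lambda n^{-2\alpha/(1+2\alpha)}\Big) + 2n^{-1}e^{-Dn^{1/(1+2\alpha)}} + \sum_{j=J'+1}^{\infty}\sum_k \theta_{j,k}^2.$$

Now for $f_0 = 0$, all $\xi_{j_0,k} = 0$ and all $\theta_{j,k} = 0$ so

$$E_{f_0}\|\hat f_n - f_0\|_2^2 \le 2n^{-1}e^{-Dn^{1/(1+2\alpha)}}.$$

Thus, with $B_n = n e^{Dn^{1/(1+2\alpha)}}$,

$$\limsup_{n\to\infty} B_n E_{f_0}\|\hat f_n - f_0\|_2^2 < \infty \quad\text{and}\quad \lim_{n\to\infty} n^{1/(1+2\alpha)}(\log B_n)^{-1} = D^{-1}.$$

On the other hand, the estimator attains the optimal rate uniformly over
$F(\alpha, M)$. This can be seen easily from (46) and (65):

$$\sup_{f\in F(\alpha,M)} E_f\|\hat f_n - f\|_2^2 \le 2\lambda n^{-2\alpha/(1+2\alpha)} + 2n^{-1}e^{-Dn^{1/(1+2\alpha)}} + \sum_{j=J'+1}^{\infty}\sum_{k=1}^{2^j} C^2 2^{-j(1+2\alpha)}$$
$$\le 2(\lambda + C^2)\,n^{-2\alpha/(1+2\alpha)}(1 + o(1)). \qquad □$$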



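Proof of Theorem 3. We assume $J_* < J$ in the following proof. In
the special case of $J_* \ge J$ the estimator is the BlockJS estimator. The proof
for this case follows from that of Theorem 4. Denote by $I_n(x) = I(x \in [x_0 - c_n, x_0 + c_n])$.
Then

$$R(\hat f_n, f; x_0, c_n) = \frac{1}{2c_n} E\int_0^1 \Big[\sum_k (\hat\xi_{j_0,k} - \xi_{j_0,k})\phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_k (\hat\theta_{j,k} - \theta_{j,k})\psi_{j,k}(x)\Big]^2 I_n(x)\,dx$$
$$\le E\Big\{\frac{1}{c_n}\int_0^1 \Big[\sum_k (\hat\xi_{j_0,k} - \xi_{j_0,k})\phi_{j_0,k}(x)\Big]^2 I_n(x)\,dx\Big\} + E\Big\{\frac{1}{c_n}\int_0^1 \Big[\sum_{j=j_0}^{\infty}\sum_k (\hat\theta_{j,k} - \theta_{j,k})\psi_{j,k}(x)\Big]^2 I_n(x)\,dx\Big\}$$
$$\le Cn^{-1} + E\Big\{\frac{1}{c_n}\int_0^1 \Big[\sum_{j=j_0}^{\infty}\sum_k (\hat\theta_{j,k} - \theta_{j,k})\psi_{j,k}(x)\Big]^2 I_n(x)\,dx\Big\}.$$

Hence

(66) $$R(\hat f_n, f; x_0, c_n) \le Cn^{-1} + 2\|\psi\|_\infty^2\, E\Big(\sum_{j=j_0}^{J_*-1}\sum_{(j,k)\in H_j} 2^{j/2}|\hat\theta_{j,k} - \theta_{j,k}|\Big)^2 + E\Big\{\frac{2}{c_n}\int_0^1 \Big(\sum_{j\ge J_*}\sum_k (\hat\theta_{j,k} - \theta_{j,k})\psi_{j,k}(x)\Big)^2 I_n(x)\,dx\Big\}$$
$$\le Cn^{-1} + 2\|\psi\|_\infty^2 \Big(\sum_{j=j_0}^{J_*-1}\sum_{(j,k)\in H_j} 2^{j/2}\big(E(\hat\theta_{j,k} - \theta_{j,k})^2\big)^{1/2}\Big)^2 + \frac{2}{c_n}\sum_{j\ge J_*}\sum_{(j,k)\in H_j} E(\hat\theta_{j,k} - \theta_{j,k})^2.$$

The last inequality follows from Lemma 1 and the elementary inequality

$$E\Big(\sum_{i=1}^{n} X_i\Big)^2 \le \Big(\sum_{i=1}^{n} (E X_i^2)^{1/2}\Big)^2.$$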

We now consider the three cases separately. The main tool is the risk
bounds (43) and (44) given in Lemma 2. Note that with $\sigma^2 = n^{-1}$ and
$L = \log n$ the second term on the right-hand side of (43) and (44) is $2\lambda n^{-2}$,
which is negligible in the following risk calculations, and we will absorb this
term into the first term, $Cn^{-1}$, in the calculations below. Note that



E E E{e hk -e hk f<Cm:J{\ogn)n-\ ]T el k )+o( 



n 2 ) 



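In case (i), let $J_0$ be the smallest integer satisfying $2^{J_0} \ge (n/\log n)^{1/(1+2\alpha)}$.
Then $J_0 \le J_*$. It follows from Lemmas 2 and 4 that

$$R(\hat f_n, f; x_0, c_n) \le C\Big(\sum_{j=j_0}^{J_0-1} 2^{j/2}(\log n)^{1/2} n^{-1/2} + \sum_{j=J_0}^{J_*-1} 2^{j/2}\,2^{-j((1/2)+\alpha)}\Big)^2 + Cc_n^{-1}\sum_{j=J_*}^{\infty} 2^j c_n 2^{-j(1+2\alpha)}$$
$$\le C\left(\frac{\log n}{n}\right)^{2\alpha/(1+2\alpha)}.$$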

In case (ii), Lemmas 2 and 4 yield that 

R(L f; x ,c n )<C'( g 2?' 2 (log nfl 2 n-^ 
\ i=io / 



+ Cc" 1 (log n)n" l + Cc- 1 £ 2 J c n 2- J ( 1+2a ) 
< c l -^n~ 2a ^ 1+2a \ 
In case (iii) let $J_1$ be the smallest integer satisfying $2^{J_1} \ge n^{1/(1+2\alpha)}$. We have
R(Lf;x ,c n )<c( £ 2^' /2 (logn) 1 / 2 n-( 1 / 2 ) N ) + Cc" 1 (log n)^ 1 

V 3=30 ' 

Jl ~ X 03n 00 

■ t, logn r— ' 

< Cn -2a/(l+2a)^ Q 

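Proof of Theorem 4. Let $\hat f_n(x)$ be the BlockJS estimator given in
(33). Denote $I_n(x) = I(x \in [x_0 - c_n, x_0 + c_n])$. Similarly as in (66), in the
proof of Theorem 3, for any $T \ge j_0$,

(67) $$R(\hat f_n, f; x_0, c_n) \le Cn^{-1} + 2\|\psi\|_\infty^2 \Big(\sum_{j=j_0}^{T-1}\sum_{(j,k)\in H_j} 2^{j/2}\big(E(\hat\theta_{j,k} - \theta_{j,k})^2\big)^{1/2}\Big)^2 + \frac{2}{c_n}\sum_{j\ge T}\sum_{(j,k)\in H_j} E(\hat\theta_{j,k} - \theta_{j,k})^2.$$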

Denote by Jj, i = 0, 1,2,3,4, the smallest integers satisfying 



l/(l+2a) n V(l+2a) 

2 J o > 1 __ ) 2 Jl > 



, log n 

2 Ja > n 1 /( 1 + 2 ") 



logn 



2j3> n 1 ^logn 2 , 4>n V(l + -). 
Then for all j < J\, 

Note that for all levels $j \le J_3$, the coefficients of wavelet basis functions $\psi_{j,k}$
whose support has nonempty intersection with the interval $[x_0 - c_n, x_0 + c_n]$
are in at most $N_\psi + 1$ blocks because the number of such coefficients is less
than $N_\psi\log n$.

We will consider the three cases separately. Again, with $\sigma^2 = n^{-1}$ and
$L = \log n$, the second term on the right-hand side of (44) and (43) is $2\lambda n^{-2}$,
which is negligible and thus will be absorbed into the $Cn^{-1}$ term in the
calculations below.

(i) Choose $T = J_1$ in (67). In this case $J_0 \le J_1$. It then follows from
Lemmas 2 and 4 that

$$R(\hat f_n, f; x_0, c_n) \le Cn^{-1} + C\Big(\sum_{j=j_0}^{J_0-1} 2^{j/2}(\log n)^{1/2} n^{-1/2} + \sum_{j=J_0}^{J_1-1} 2^{j/2}\,2^{-j((1/2)+\alpha)}\Big)^2 + Cc_n^{-1}\sum_{j=J_1}^{\infty} 2^j c_n 2^{-j(1+2\alpha)}$$
$$\le C\left(\frac{\log n}{n}\right)^{2\alpha/(1+2\alpha)}.$$

(ii) Choose $T = J_1$ in (67). Lemmas 2 and 4 yield that



<C 



R(fn,f; 

/Ji-1 \ 2 Ja-l 

<Cn- l + ci Y, 2- j/2 (logn) 1 / 2 n-( 1 / 2 ) + Cc" 1 £ (logn)n" 1 
\ j'=io / j=Ji 



oo 



+ Cc- 1 Y 2 j c„2-^ 1+2a ) 

3=Jl 

< Cri -i + c ^»- 2a /( 1+2Q ) 

+ C( j 2 _ j^^^-WCi+aa) + c ^ n -2a/(i +2a ) 
= c (logn)(loglogn) n _ 2a/(1+2a)(i + 

(iii) Choose $T = J_3$ in (67). In this case we have

R(fn,f; 

/Ji-1 \ 2 J3-I 

^Cn^ + Ci Y 2j/2 ( l °g n ) 1/2n ~ 1/2 ) + Cc n l J2 ( lo § n ) n ~ l 
\j=jo / j=Ji 

+ £ f^Oogn^ + Cc- 1 £ 2^c n 2-^ 1 + 2 «) 

r— ; logn f—i 

3=J3 3=Ja 

<C n -i + C^„-2a/(i+2«) 

d n 

+ C(J 3 - J 1 )i^n- 2a /( 1+2Q ) + Cn" 2a /( 1+2a ) + Cn" 2Q /( 1+2Q ) 
$$= Cn^{-2\alpha/(1+2\alpha)}(1 + o(1)). \qquad □$$